AITopics

arXiv.org Artificial Intelligence Aug-14-2025

Verify Distributed Deep Learning Model Implementation Refinement with Iterative Relation Inference

Wang, Zhanghan, Ding, Ding, Zhu, Hang, Lin, Haibin, Panda, Aurojit

Distributed machine learning training and inference is common today because today's large models require more memory and compute than can be provided by a single GPU. Distributed models are generally produced by programmers who take a sequential model specification and apply several distribution strategies to distribute state and computation across GPUs. Unfortunately, bugs can be introduced in the process, and a distributed model implementation's outputs might differ from the sequential model's outputs. In this paper, we describe an approach to statically identify such bugs by checking model refinement, that is, can the sequential model's outputs be reconstructed from the distributed model's outputs? Our approach, implemented in GraphGuard, uses iterative rewriting to prove model refinement. Our approach can scale to today's large models and deployments: we evaluate it using GPT and Llama-3. Further, it provides actionable output that aids in bug localization.
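The refinement question in the abstract above (can the sequential outputs be reconstructed from the distributed outputs?) can be illustrated with a toy numerical check. This is only a sketch under stated assumptions: it runs the computation on concrete inputs, whereas the abstract describes a static, rewriting-based proof, and every name here (`split_columns`, `concat_columns`, the two-shard layout) is hypothetical and not taken from GraphGuard.

```python
# Illustrative only: a numerical spot-check of the refinement relation for one
# column-sharded matrix multiply. Not GraphGuard's static, rewriting-based method.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards (one per hypothetical GPU)."""
    width = len(w[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(parts)]

def concat_columns(shards):
    """Rebuild a full output by concatenating per-shard outputs column-wise."""
    return [sum((shard[r] for shard in shards), []) for r in range(len(shards[0]))]

x = [[1.0, 2.0], [3.0, 4.0]]                        # input batch (2 x 2)
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]    # weights (2 x 4)

# Sequential model: one matmul on one device.
sequential_out = matmul(x, w)

# Distributed model: each shard computes its slice of the output.
shard_outs = [matmul(x, shard) for shard in split_columns(w, parts=2)]

# Refinement holds if some reconstruction maps shard outputs to the sequential output.
reconstructed = concat_columns(shard_outs)
refines = reconstructed == sequential_out

# A typical distribution bug: shards gathered in the wrong order.
buggy = concat_columns(shard_outs[::-1])
bug_detected = buggy != sequential_out
```

Here the reconstruction is a plain column-wise concatenation; other distribution strategies would need a different reconstruction (for example, a sum over shards for row-wise splits).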

artificial intelligence, machine learning, natural language

2508.09505

Country: North America > United States (0.46)

Genre: Research Report (0.51)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Padigela, Harshith, Shah, Chintan, Juyal, Dinkar

ML-Dev-Bench: Comparative Analysis of AI Agents on ML development workflows

arXiv.org Artificial Intelligence Feb-15-2025

In this report, we present ML-Dev-Bench, a benchmark aimed at testing agentic capabilities on applied Machine Learning development tasks. While existing benchmarks focus on isolated coding tasks or Kaggle-style competitions, ML-Dev-Bench tests agents' ability to handle the full complexity of ML development workflows. The benchmark assesses performance across critical aspects including dataset handling, model training, improving existing models, debugging, and API integration with popular ML tools. We evaluate three agents -- ReAct, Openhands, and AIDE -- on a diverse set of 30 tasks, providing insights into their strengths and limitations in handling practical ML development challenges.

large language model, machine learning, natural language

2502.00964

Genre: Workflow (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)

#artificialintelligence Oct-1-2022, 17:45:33 GMT

TODOs for Effective ML teamwork at an early-stage startup - Machine Learns

Abstracting ML code sacrifices expressiveness, increases coupling, and aggravates maintenance. These costs might be OK for regular software, but things are different for ML. I am sure you know how it feels to waste hours trying to match an API when you want to implement an ML trick. APIs and abstractions are bad for fast-paced ML R&D. ML moves too fast, and any API is outdated from its inception. We see a similar pattern with well-known ML libraries (the transition from Theano to TensorFlow to PyTorch to JAX…).

early-stage startup, experiment, library

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

#artificialintelligence Jul-28-2022, 12:12:50 GMT

Performance at Scale: Graphcore's Latest MLPerf Training Results

Graphcore's latest submission to MLPerf demonstrates two things very clearly – our IPU systems are getting larger and more efficient, and our software maturity means they are also getting faster and easier to use. Software optimisation continues to deliver significant performance gains, with our IPU-POD16 now outperforming Nvidia's DGX A100 on the computer vision model ResNet-50. Training ResNet-50 takes 28.3 minutes on the IPU-POD16, compared to 29.1 minutes for the DGX A100 – a performance improvement of 24% since our first submission, achieved through software alone. It is a significant milestone, given that ResNet-50 has traditionally been a showpiece model for GPUs. Our software-driven performance gain for ResNet-50 on the IPU-POD64 was even greater, at 41%.
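To keep the two comparisons in the excerpt apart, the short calculation below works out the head-to-head margin from the two quoted training times. It is a sketch that uses only the 28.3 and 29.1 minute figures from the text; the 24% figure is measured against Graphcore's own first submission, whose training time is not given here, so it is not recomputed.

```python
# Training times for ResNet-50 as quoted in the excerpt (minutes).
ipu_pod16_minutes = 28.3
dgx_a100_minutes = 29.1

# Head-to-head margin between the two systems.
speedup = dgx_a100_minutes / ipu_pod16_minutes
time_saved_pct = 100 * (dgx_a100_minutes - ipu_pod16_minutes) / dgx_a100_minutes

print(f"speedup: {speedup:.3f}x, time saved: {time_saved_pct:.1f}%")
```

The head-to-head margin is a few percent, while the 24% and 41% figures describe improvement over Graphcore's earlier results on the same hardware.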

graphcore, host server, latest mlperf training result

Industry: Information Technology (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (0.73)
Information Technology > Artificial Intelligence > Natural Language (0.56)

#artificialintelligence Apr-29-2022, 09:56:13 GMT

Machine Learning Top 5 Models Implementation "A-Z"

I have worked with IBM, Cisco, EMC-RSA and others, and I have been an academic for a couple of years. I have worked on four continents and travelled extensively. I have a PhD in Engineering, an MSc in AI and an MBA, and I am also a Certified Blockchain Expert.

machine learning, machine learning top 5, model implementation

Genre:

Instructional Material > Online (0.40)
Instructional Material > Course Syllabus & Notes (0.40)

Industry:

Information Technology > Security & Privacy (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Security & Privacy (0.63)
Information Technology > Artificial Intelligence > Machine Learning (0.52)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)

arXiv.org Artificial Intelligence Sep-7-2020

TorchKGE: Knowledge Graph Embedding in Python and PyTorch

Boschin, Armand

TorchKGE is a Python module for knowledge graph (KG) embedding relying solely on PyTorch. This package provides researchers and engineers with a clean and efficient API to design and test new models. It features a KG data structure, simple model interfaces and modules for negative sampling and model evaluation. Its main strength is a very fast evaluation module for the link prediction task, a central application of KG embedding. Various KG embedding models are also already implemented. Special attention has been paid to code efficiency and simplicity, documentation and API consistency. It is distributed using PyPI under BSD license. Source code and pointers to documentation and deployment can be found at https://github.com/torchkge-team/torchkge.
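For readers unfamiliar with the link prediction task mentioned above, the sketch below shows how such an evaluation is commonly scored: each test triple's true tail is ranked against all candidate entities, and the ranks are summarised as mean reciprocal rank and hits@1. It is written in plain Python and deliberately does not use or mirror TorchKGE's API; the TransE-style scoring function, the toy embeddings, and all function names are assumptions chosen for illustration.

```python
# Illustrative only: TransE-style scoring and unfiltered tail ranking in plain Python.
# Names and embeddings are hypothetical; this is not TorchKGE's interface.

def transe_score(head, relation, tail):
    """Negative L1 distance between head + relation and tail; higher is more plausible."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

def tail_rank(head, relation, true_tail, entity_embeddings):
    """Rank (1 = best) of the true tail among all candidate tail entities."""
    true_score = transe_score(head, relation, entity_embeddings[true_tail])
    better = sum(
        1
        for name, emb in entity_embeddings.items()
        if name != true_tail and transe_score(head, relation, emb) > true_score
    )
    return better + 1

# Toy 2-dimensional embeddings in which head + relation lands exactly on the tail.
entities = {
    "paris": [0.0, 0.0],
    "france": [1.0, 1.0],
    "berlin": [4.0, 0.0],
    "germany": [5.0, 1.0],
}
capital_of = [1.0, 1.0]

test_triples = [("paris", "france"), ("berlin", "germany")]
ranks = [tail_rank(entities[h], capital_of, t, entities) for h, t in test_triples]

mean_reciprocal_rank = sum(1.0 / r for r in ranks) / len(ranks)
hits_at_1 = sum(r == 1 for r in ranks) / len(ranks)
```

Scoring every entity as a candidate for every test triple is what makes this evaluation expensive on large graphs, which is why a fast evaluation module matters.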

artificial intelligence, machine learning, pytorch

2009.02963

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.64)

#artificialintelligence Jun-28-2020, 10:20:10 GMT

Machine Learning Top 5 Models Implementation "A-Z"

One case study, five models from data preprocessing to implementation with Python, with some examples where no coding is required.

artificial intelligence, implementation, machine learning top 5

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.39)

Eck, Bradley, Fusco, Francesco, Gormally, Robert, Purcell, Mark, Tirupathi, Seshu

Scalable Deployment of AI Time-series Models for IoT

arXiv.org Artificial Intelligence Mar-24-2020

IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates, in Python and R, following a typical machine-learning workflow are supported. A knowledge-based approach to managing model and time-series data allows the use of general semantic concepts for expressing feature engineering tasks. Model templates can be programmatically deployed against specific instances of semantic concepts, thus supporting model reuse and automated replication as the IoT application grows. Deployed models are automatically executed in parallel leveraging a serverless cloud computing framework. The complete history of trained model versions and rolling-horizon predictions is persisted, thus enabling full model lineage and traceability. Results from deployments in real-world smart-grid live forecasting applications are reported. Scalability of executing up to tens of thousands of AI modelling tasks is also evaluated.

deployment, model implementation, prediction

2003.12141

Country:

Europe > Middle East > Cyprus (0.05)
Europe > Switzerland (0.04)
Europe > Germany (0.04)
Europe > Ireland (0.04)

Genre: Research Report (0.50)

Industry:

Energy > Power Industry (1.00)
Information Technology (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

#artificialintelligence May-26-2017, 06:54:47 GMT

AT&T - Professional-Data Scientist - Plano, TX

If this job is available in multiple locations, by applying to this position you may be considered for the additional locations in your area.
DUTIES: Design and implement statistical data quality procedures around new data sources; design and implement processes and layouts for complex, large-scale data sets used for modeling, data mining and research purposes; perform data analysis, statistical modeling and data mining in finance, engineering and science domains; build next generation risk and response models using large scale datasets and data sources; perform credit risk and fraud modeling, campaign segmentation, risk-based pricing, direct marketing strategy, underwriting and collection strategies, model implementation and monitoring; utilize R, Python, Statistics, Machine Learning, Data Mining, clustering, and Regression; perform analysis, concept implementation and create detailed methodology descriptions for data research activities; monitor industry trends in the data science domain and apply them to create innovative solutions; utilize machine learning techniques to build learning models that allow the organization to predict outcomes with business implications; utilize data mining techniques to reveal patterns in the data that have implications on the business decisions.
Qualifications
REQUIREMENTS: Requires a Master's or foreign equivalent degree in Statistics or Computer Science and three years of experience in the job offered or three years of experience in designing and implementing processes and layouts for complex, large-scale data sets used for modeling, data mining and research purposes; performing data analysis, statistical modeling and data mining in finance, engineering and science domains; building next generation risk and response models using large scale datasets and data sources; performing credit risk and fraud modeling, campaign segmentation, risk-based pricing, direct marketing strategy, underwriting and collection strategies, model implementation and monitoring; utilizing R, Python, Statistics, Machine Learning, Data Mining, clustering, and Regression.
AT&T is an Affirmative Action/Equal Opportunity Employer, and we are committed to hiring a diverse and talented workforce.

artificial intelligence, machine learning, professional-data scientist

Country: North America > United States > Texas > Collin County > Plano (0.40)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)